ABSTRACT

Small RNAs (smRNAs) in plants, mainly microRNAs 
and small interfering RNAs, play important roles in 
both transcriptional and post-transcriptional gene 
regulation. The broad application of high-throughput
sequencing technology has made the routine generation of
bulk smRNA sequences in laboratories possible, and has
thus significantly increased the need for batch analysis tools.
PsRobot is a web-based easy-to-use tool dedicated 
to the identification of smRNAs with stem-loop 
shaped precursors (such as microRNAs and short 
hairpin RNAs) and their target genes/transcripts. It 
performs fast analysis to identify smRNAs with 
stem-loop shaped precursors among batch input 
data and predicts their targets using a modified 
Smith-Waterman algorithm. PsRobot integrates 
the expression data of smRNAs in major plant 
smRNA biogenesis gene mutants and smRNA- 
associated protein complexes to give clues to the 
smRNA generation and functional processes. 
Besides improved specificity, the reliability of 
smRNA target prediction results can also be 
evaluated by mRNA cleavage (degradome) data. 
The cross-species conservation status and the
multiplicity of smRNA target sites are also
provided. PsRobot is freely accessible at
http://omicslab.genetics.ac.cn/psRobot/.

INTRODUCTION 

MicroRNAs (miRNAs) and small interfering RNAs 
(siRNAs) are two major classes of endogenous regulatory 
small RNAs (smRNAs) in plants. They are usually 21-24
nucleotides (nt) long, and both function by pairing to
targets via sequence complementarity (1). miRNAs are 
usually generated from limited genomic loci and mainly 
work post-transcriptionally to down-regulate target 
mRNAs, whereas siRNAs have much broader origins 
and can function at both transcriptional and post-trans- 
criptional levels (2-6). Both miRNAs and siRNAs are typ- 
ically identified by cloning and sequencing of small size 
RNAs (7). The development and application of
high-throughput sequencing technology have significantly
advanced the study of smRNAs, but have also left an
increasing number of laboratories facing the task of
data analysis.

Plant miRNAs are usually ~21 nt long and processed 
from the pairing stem region of longer precursors with 
hairpin-shaped secondary structures (8). The presence of
a stem-loop precursor as the lowest energy folding form is
considered one of the key criteria for the identification
of new miRNAs (9,10). However, as many genomic
loci giving rise to siRNAs can also be folded into 
hairpin-shaped structures (11,12), searching for miRNAs 
by their expression evidence and the presence of stem-loop 
precursors may yield many false-positive candidates. 
According to the community-agreed plant miRNA annotation
criteria (9,13), a dominantly expressed candidate
sequence and detection of the pairing
sequence (miRNA*) are both required for miRNAs (13).
These constraints improve prediction specificity, but
the results are still far from ideal.

It has been shown that plant miRNAs mainly pair with 
their targets via nearly perfect sequence complementarity, 
very similar to the manner of siRNAs (9,13). This has
made the prediction of plant miRNA targets relatively
straightforward, yielding a limited number of targets per
miRNA rather than the hundreds predicted in animals (9,13).
Yet it still requires a large amount of work to identify real
miRNA targets from the predicted candidate list. In
addition, increasing lines of evidence have shown that
gaps are also tolerated in the pairing between plant
miRNAs and their targets (14), which enlarges the
candidate target list and makes experimental validation
more difficult (15).

Most of the available smRNA target prediction
software was designed for animal miRNAs and is not ideal
for plant data, because large bulges are common in animal
miRNA:target alignments and because miRNA seed sequences
and central sequences contribute differently to the
stability of miRNA:target pairs in animals and plants.
Several recently developed plant miRNA target prediction
tools, such as targetFinder (16,17), psRNATarget (18) and
CleaveLand (19), have provided great help to researchers.
However, they are also limited by the requirement of local
installation, the lack of degradome data support or
dependence on the target prediction results of third-party
software (Supplementary Table S1).

PsRobot is designed to partially solve the aforemen- 
tioned problems in plant miRNA and target prediction. 
It incorporates the commonly agreed criteria to identify 
smRNAs with stem-loop shaped precursors from user-uploaded
sequences and predict their targets. Multiple
user-adjustable parameters allow the software to meet
different user needs. To facilitate
better classification and functional analysis of input sequences,
psRobot integrates the expression information
of input sequences in reported plant smRNA-binding
protein pull-down assays and in mutants of major smRNA
biogenesis pathway genes. For example, strong association
with the ARGONAUTE1 (AGO1) protein and expression depletion
in the dcl1 mutant will strengthen the confidence that an
smRNA with a stem-loop precursor is an miRNA (1). It
also incorporates the available mRNA degradome data 
for users to evaluate the reliability of miRNA target pre- 
diction results. The multiplicity of miRNA binding sites 
on a single target as well as the cross species conservation 
status of the target sites are also analyzed and provided. 
PsRobot can be either used online or downloaded and 
installed locally. The local version offers a larger 
capacity for input data size and has the function to in- 
corporate user-uploaded degradome data. 

METHODS AND RESULTS 

The Stem-loop smRNA Prediction Function 
Input, algorithm and parameters 

The stem-loop smRNA prediction function takes input 
smRNA sequences in FASTA or plain text format. For 
each query smRNA, the software finds its perfectly 
matched genomic origins, and extracts various lengths of 
upstream and downstream sequences as precursors, 
assuming that the smRNA may originate from either the 
5' or 3' end of precursors, with 10 nt extension at one end 
of the precursor each time, till reaching the user defined 
precursor length. The secondary structures of the extracted
precursor sequences are then evaluated by the
MFOLD program (20). Precursor sequences with
stem-loop structure as the minimal free energy folding
form and the corresponding query smRNA will be
selected and reported in the result pages (Fig. 1a), either
in html or text format. If the conservation analysis
function of smRNA sequences is selected, the cross
species conservation status of smRNAs will be analyzed
by aligning the query small sequences to eight selected
plant genomes using the BLAST program (allowing up
to two mismatches in the smRNA sequences), and the
ClustalW (21) alignment of the identified smRNA homologous
sequences will be included in the outputs (Fig. 1b).
The repetitive sequence regions of the 26 preloaded 
genomes were identified by the RepeatMasker program 
(http://repeatmasker.org) and stored in the background 
database. Every genomic locus of the query sequences 
will be searched against the database to identify repeat 
sequence originated smRNAs. Parameters for users to 
adjust include smRNA conservation analysis, the 
minimal and maximal numbers of mismatched nucleotides 
within the query smRNA sequences in the obtained pre- 
cursor structures, the maximal lengths of extracted precur- 
sors, and the permission of large loop sequence in the 
qualified precursors. Although the precursor structures
of most canonical plant miRNAs are very short and well
paired, there are still some with large bulges or hairpin
loops, such as ath-MIR393a and ath-MIR167d (22,23).
Enabling the "Retain large loop small RNA" function will
include precursors with large loops in the prediction
results.
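
The precursor extraction step described at the start of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not psRobot's implementation: the function name `candidate_precursors`, the 0-based end-exclusive coordinates and the choice to start from the smRNA length plus one 10 nt step are all hypothetical, and any flanking sequence the real tool may add around the smRNA is ignored.

```python
def candidate_precursors(genome, start, end, max_len, step=10):
    """Return (window_start, window_end) candidates for one smRNA locus.

    genome     : chromosome sequence as a string
    start, end : 0-based, end-exclusive coordinates of the perfectly
                 matched smRNA (hypothetical convention)
    max_len    : user-defined maximal precursor length
    step       : extension added at one end each time (10 nt in the text)
    """
    windows = []
    length = (end - start) + step
    while length <= max_len:
        # smRNA assumed at the 5' end of the precursor: extend downstream.
        if start + length <= len(genome):
            windows.append((start, start + length))
        # smRNA assumed at the 3' end of the precursor: extend upstream.
        if end - length >= 0:
            windows.append((end - length, end))
        length += step
    return windows
```

Each returned window would then be passed to a folding program, and only windows whose minimal free energy structure is a stem-loop would be kept.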

Data collection 

A preloaded species needs to be selected to define the 
origins of smRNAs. Up to 26 completed plant genomes 
are currently supported by psRobot, and future finished 
genomes will also be incorporated on their release. The 
Arabidopsis thaliana genome was downloaded from the 
Arabidopsis Information Resource (TAIR) (24) and the 
rice genome was from Rice Genome Annotation Project 
(RGAP) (25) and the Rice Annotation Project Database 
(RAP-DB) (26) databases. Other plant genomes were 
downloaded from the Phytozome genome database (27). 
SmRNA deep sequencing data in smRNA biogenesis 
mutants and AGO-associated libraries were collected 
from the NCBI Gene Expression Omnibus database 
with datasets GSE11094, GSE14695, GSE16959, 
GSE13605, GSE10036, GSE6682, GSE5343 and 
GSE6682 for Arabidopsis thaliana (28-35) and datasets 
GSE20748 and GSE18250 for rice (36,37). 

smRNA biogenesis and functional data 

PsRobot incorporates published smRNA sequencing data 
from major plant smRNA biogenesis and function 
associated protein complexes or gene mutants and 
returns this information together with the stem-loop prediction
results. If a query sequence is present in any of the
preloaded databases, its sequence reads in each database
will be listed (Fig. 1c). As it has been shown that smRNAs
of different origins are processed by different Dicer-like
(DCL) family proteins and associate with different AGO
protein complexes, such information will help users
evaluate the types and functions of the query smRNAs.
For example, decreased expression in the dcl1 mutant together

[Figure 1 image. Panel A: stem-loop prediction output for a query smRNA in Arabidopsis thaliana (TAIR release 10), listing the number of genomic loci, the number of stem-loops, the precursor location (Chr3: 3366339-3366418), the precursor sequence, the folding energy (dG = -28.5) and the predicted structure. Panel B: small RNA conservation study, with the number of conserved species and a multiple alignment of homologous sequences. Panel C: small RNA biogenesis mutants studies.]

Figure 1. Output of stem-loop smRNA prediction. (A) The genomic mapping and stem-loop precursor prediction of the query smRNAs; detailed
information of genomic location, precursor sequence, secondary structure of each predicted stem-loop smRNA locus and their folding energies are
included. Sequence in red and capitalized letters is the query smRNA sequence; (B) shows the conservation status of the smRNA in 8 plant species;
(C) shows the normalized reads of a given smRNA in smRNA biogenesis mutants and AGO-associated libraries.



with enrichment in the AGO1 association libraries provides
strong evidence for authentic miRNAs (1), whereas
decreased expression in the dcl3 mutant together with enrichment
in the AGO4 association libraries indicates that the
smRNAs are siRNAs (38).
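
As a rough illustration of how this kind of evidence could be combined, the sketch below labels a query smRNA from its normalized reads in a few libraries. It is not part of psRobot: the function name, the library keys (`wild_type`, `dcl1`, `dcl3`, `AGO1_IP`, `AGO4_IP`) and the 2-fold threshold are hypothetical, and comparing immunoprecipitation libraries directly with the wild-type library is a simplification.

```python
def classify_smrna(reads, fold=2.0):
    """Label an smRNA from normalized read counts per library.

    reads : dict mapping a library name to normalized reads
            (keys are hypothetical, see the lead-in)
    fold  : fold change treated as depletion or enrichment (assumed)
    """
    wt = reads.get("wild_type", 0.0)
    if wt <= 0:
        return "unclassified"

    def depleted(lib):
        # A missing library is treated as "not depleted".
        return reads.get(lib, wt) * fold <= wt

    def enriched(lib):
        return reads.get(lib, 0.0) >= wt * fold

    mirna_like = depleted("dcl1") and enriched("AGO1_IP")
    sirna_like = depleted("dcl3") and enriched("AGO4_IP")
    if mirna_like and sirna_like:
        return "ambiguous"
    if mirna_like:
        return "miRNA-like"
    if sirna_like:
        return "siRNA-like"
    return "unclassified"
```

In practice such a label would only be a hint to be weighed together with the stem-loop prediction itself.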

Performance 

To evaluate the accuracy of stem-loop smRNA prediction 
function, all miRBase (release vl7) (39) recorded 
Arabidopsis thaliana miRNAs were collected and used as 
the test dataset. Among the 213 nonredundant sequences, 
202 (~94%) miRNAs were successfully identified as 
stem-loop smRNAs. 

The smRNA Target Prediction Function 
Input and parameters 

To use the smRNA target prediction function, users can 
either select the known miRNAs from a plant species or 
submit their own sequences. The software will search for 
target sites among the pre-loaded genes/transcripts of the 
corresponding genome or the user-uploaded target library. 
The uploaded query smRNA sequences and target library 
should be either in FASTA or plain text format. The parameters
for users to adjust include the following: (1) the
penalty score for the alignment between smRNAs and
targets, which is defined by the formulas below; (2) the
boundaries of the essential sequence region, within which
mismatches or gaps receive double the penalty score of
other regions; (3) the threshold for the total
number of gaps within the smRNA and target alignment
region; and (4) the region within which gaps are permitted.
Degradome sequences mapped within the target sites will 
be analyzed and presented. Only preloaded degradome 
data are available for the online version of psRobot, yet 
users can incorporate their own degradome data via the 
psRobot_deg program once psRobot is installed locally. 
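
Parameter (2) above can be illustrated for the simplest case, a gapless alignment. The sketch below is one possible reading and not psRobot's code: the per-position penalty is taken as the difference between the match score and the actual pair score (using the scores of Formula 2 as we read them), the region boundaries `(2, 17)` are hypothetical values chosen for illustration, and doubling is applied to every non-zero penalty inside the region, including G:U pairs, which is an assumption.

```python
W_MATCH, W_MISMATCH, W_GU = 3.0, -1.0, 1.0  # Formula 2, as read here
_WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
_WOBBLE = {("G", "U"), ("U", "G")}


def gapless_penalty(smrna, site_3to5, essential=(2, 17)):
    """Penalty of a gapless smRNA:target pairing.

    smrna     : smRNA written 5'->3'
    site_3to5 : target site written 3'->5', so position i faces position i
    essential : 1-based inclusive boundaries of the essential region
                (hypothetical defaults)
    """
    q = smrna.upper().replace("T", "U")
    r = site_3to5.upper().replace("T", "U")
    if len(q) != len(r):
        raise ValueError("gapless pairing needs equal lengths")
    lo, hi = essential
    total = 0.0
    for pos, pair in enumerate(zip(q, r), start=1):
        if pair in _WATSON_CRICK:
            score = W_MATCH
        elif pair in _WOBBLE:
            score = W_GU
        else:
            score = W_MISMATCH
        penalty = W_MATCH - score
        if lo <= pos <= hi:
            penalty *= 2  # double penalty inside the essential region
        total += penalty
    return total
```

With these assumptions, the same mismatch costs twice as much at position 10 as at position 20 of a 22 nt smRNA.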

Algorithms 

As the pairings between smRNAs and target mRNAs 
involve global sequence alignment of smRNAs and local 
sequence alignment of targets, we applied a modified 
Smith-Waterman algorithm (Formula 1) (40) with the 
defined scoring system (Formula 2) to calculate the align- 
ment scores between the query smRNAs and targets. The 
penalty score of each candidate alignment is obtained by 
subtracting the actual alignment score from the ideal 
perfect global pairing score (Formula 3). Alignments that
meet the penalty score cutoff are traced back and
reported in the result page (Fig. 2). An exhaustive search
is performed on each mRNA to detect the potential
presence of multiple target sites (target multiplicity).
Parallel computing is used to accelerate the search.
A standalone local version of this function is
also available for download to facilitate analysis of large
datasets.
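
The scoring scheme of Formulas 1 to 3 can be sketched as a small dynamic program. This is a minimal sketch under stated assumptions rather than psRobot's implementation: the function names are hypothetical, the G:U score of 1 is our reading of Formula 2, the mRNA is reversed so that the smRNA and the target pair in antiparallel orientation, and the essential-region doubling, gap limits, backtrace and multiple-site search are all left out.

```python
W_MATCH, W_MISMATCH, W_GU, W_GAP = 3.0, -1.0, 1.0, -1.1  # Formula 2, as read here
_WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
_WOBBLE = {("G", "U"), ("U", "G")}


def pair_score(a, b):
    """Score for smRNA base a facing target base b."""
    if (a, b) in _WATSON_CRICK:
        return W_MATCH
    if (a, b) in _WOBBLE:
        return W_GU
    return W_MISMATCH


def best_penalty(smrna, target_mrna):
    """Return (lowest penalty, end column j) for one smRNA on one mRNA.

    Both sequences are given 5'->3'. The mRNA is reversed internally so
    that row i of the matrix faces column j base by base.
    """
    q = smrna.upper().replace("T", "U")
    r = target_mrna.upper().replace("T", "U")[::-1]
    m, n = len(q), len(r)
    # Formula 1 boundary conditions: first row and first column are 0.
    S = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + pair_score(q[i - 1], r[j - 1]),
                S[i - 1][j] + W_GAP,            # insertion: gap on the target
                S[i][j - 1] - W_MATCH + W_GAP,  # deletion: gap on the smRNA
            )
    # Formula 3: penalty relative to a perfect global pairing of the smRNA.
    penalties = [m * W_MATCH - S[m][j] for j in range(n + 1)]
    j_best = min(range(n + 1), key=penalties.__getitem__)
    return penalties[j_best], j_best
```

Under this scheme a perfectly complementary site has penalty 0, and the extra subtraction of the match score in the deletion case makes a gap on either strand cost the same 4.1 units.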

[Figure 2 image. Screenshot of the "Target Prediction Result" table for ath-miR393a (MIMAT0000934), with columns including sRNA ID, Target ID, Score, Conservation, Deg, Bio and KnownT. Callouts in the image label the penalty score, the miRNA and target alignment, target site conservation within and cross-species, degradome data support, expression change in smRNA biogenesis gene mutants, published evidence as miRNA targets, and the search function.]

Figure 2. Result summary table of smRNA target prediction function.



Formula 1:

\[
S(0,j) = 0,\ 0 \le j \le n; \qquad S(i,0) = 0,\ 0 \le i \le m;
\]

\[
S(i,j) = \max
\begin{cases}
S(i-1,j-1) + \omega(q_i, r_j) & \text{Match/Mismatch/(G:U pair)} \\
S(i-1,j) + \omega(q_i, -) & \text{Insertion} \\
S(i,j-1) - \omega(\mathrm{match}) + \omega(-, r_j) & \text{Deletion}
\end{cases}
,\quad 0 < i \le m,\ 0 < j \le n
\]

notes: q represents smRNA sequences; r represents target
sequences; m = length(q); n = length(r).

Formula 2:

\[
\omega(\mathrm{match}) = 3,\quad \omega(\mathrm{mismatch}) = -1,\quad \omega(\mathrm{G{:}U\ pair}) = 1,\quad \omega(\mathrm{deletion}) = \omega(\mathrm{insertion}) = -1.1
\]

notes: deletion represents gaps on query smRNAs;
insertion represents gaps on candidate targets.

Formula 3:

\[
P(m,j) = m \times \omega(\mathrm{match}) - S(m,j)
\]

Output of the smRNA target prediction function

The primary output of the smRNA target prediction
function is summarized in a sortable and searchable
table (Fig. 2), with the query sequence, target alignment,
alignment penalty score, target annotation, multiplicity of
target sites and other supporting information. The
contents of the expandable links in the result table are
summarized in the next sections.

Target site conservation

Homolog gene groups of eight plant species (Arabidopsis
thaliana, Brachypodium distachyon, Carica papaya, Oryza
sativa, Populus trichocarpa, Sorghum bicolor, Vitis vinifera
and Zea mays) are generated using OrthoMCL (41) with the
default parameters, and serve as the source data for the
conservation analysis of target sites. Predicted targets are
searched against this source data for both paralogous and
orthologous sequences with conserved target sites (Fig. 2).
The multiple alignments of the conserved target sites
and the alignments between smRNA and targets for
homolog genes can be viewed via the hyperlink in the
"Conservation" column (Fig. 3b and c).

Degradome data 

It has been shown that the miRNA cleavage products of 
targets can be cloned and detected by high-throughput 
sequencing technology, generating mRNA degradome 
data among which the 5' ends of sequences mark the
cleavage sites of miRNAs (28,37,42-44). PsRobot integrates
well-produced datasets of Arabidopsis thaliana
(GSE11094) and rice (GSE18248) in the target prediction
results (Fig. 2) (28,37). The abundance of degradome
sequences (normalized to reads per million, RPM)
is marked at the starting genomic locus of each sequence
(Fig. 3d). Candidate target sites with abundant degradome
sequences are more likely to be authentic miRNA targets. 
The position with the most abundant degradome se- 
quences should represent the miRNA cleavage site. 
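
The normalization and the "most abundant position" rule described above can be sketched as follows. This is a hedged illustration only: the function names and the input layout (a list of 5' start positions of mapped degradome reads plus the library size) are assumptions, not psRobot's data structures.

```python
from collections import Counter


def rpm_by_start(read_starts, total_reads):
    """Map each 5' start position to its abundance in reads per million."""
    counts = Counter(read_starts)
    return {pos: n * 1_000_000 / total_reads for pos, n in counts.items()}


def likely_cleavage_site(rpm, site_start, site_end):
    """Return the position within [site_start, site_end] with the highest
    RPM, or None if no degradome read starts inside the target site."""
    inside = {p: v for p, v in rpm.items() if site_start <= p <= site_end}
    if not inside:
        return None
    return max(inside, key=inside.get)
```

For a predicted target site spanning positions 1586 to 1607, for example, the function returns the single position where most degradome 5' ends pile up, which the text interprets as the miRNA cleavage site.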

[Figure 3 image. Panel A: one line of the "Target Prediction Result" table (ath-miR393a, MIMAT0000934; target AT4G03190.1; score 2.0; annotation GRH1, ATGRH1, AFB1) with links to panels B, D and E. Panel B: multiple alignment of conserved target sites (Total: 20) with Start, End and Alignment columns. Panel C: pairwise alignments between ath-miR393a and homolog targets with their scores and positions. Panel D: degradome data of AT4G03190.1. Panel E: biogenesis data of AT4G03190.1.]

Figure 3. Illustration of expandable links in the target prediction result table. (A) A sample line of the target prediction result table; (B) target site
conservation analysis results. Homolog genes with the same smRNA target site, both within and cross-species, are shown in the ClustalW multiple
alignment format; (C) the alignments between miRNA and targets provoked by mousing over the hyperlinks of gene IDs listed in (B); (D)
Degradome data of the predicted smRNA target. Positions listed in the table are orders of nucleotides in the smRNA sequences, plus upstream
and downstream 5 nt each; 'Loci number' represents the number of perfectly mapped loci of each degradome sequence, marked at the start nucleotide
position of each sequence. Normalized reads starting at each nucleotide position in the degradome data are listed in following rows. The histogram
shows the distribution of the degradome reads. In the example, the target should be cleaved after the 12th nucleotide according to the degradome
data; (E) Expression change of target gene in smRNA biogenesis pathway gene mutants compared to wild-type plants. Fold changes of the
normalized expression values between mutants and wildtype are shown in the table and the Y axis of the plot. Data from different datasets are
distinguished by colors in the plot.



Target expression in smRNA biogenesis mutants 
As the production of smRNAs will be significantly
impaired in the mutants of genes involved in plant
smRNA biogenesis pathways, such as the dcl, hyl1, hen1
and rdr family genes (38), increased expression of genes in
these mutants may indicate an inhibitory effect of
smRNAs on these targets. To facilitate inspection from this
perspective, psRobot collects published microarray data and
integrates them in the target prediction results (Fig. 3e).
Currently, this function is only available for Arabidopsis
thaliana (GSE2473, GSE3011, GSE24887) (45,46) and will
be expanded to other species as the
required data become available.

Performance 

To test the reliability of the smRNA target prediction
results, we selected 75 Arabidopsis thaliana miRNAs
with at least one reported target in the ASRP database
(Supplementary Table S2) (47). A total of 995 genes were
predicted as the targets of the 75 miRNAs using the
default parameters, of which 306 (31%) targets were
reported by the ASRP database, representing 89%
of the 344 validated Arabidopsis miRNA targets
(Supplementary Table S1). Significant improvement of
the prediction results was achieved by combining the
information on target site conservation, detectability in
degradome data and expression change in smRNA biogenesis
mutants as filters, as demonstrated by the reduction
of total predicted targets to 542, of which 292 (54%)
were confirmed targets from ASRP (85% of validated
targets) (Supplementary Table S1).
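
The percentages in this paragraph follow directly from the reported counts. The short check below recomputes them; the function name `precision_recall` is ours, and "precision" and "recall" are used here as shorthand for the fraction of predictions confirmed by ASRP and the fraction of validated targets recovered.

```python
def precision_recall(predicted, confirmed, validated_total):
    """Return (confirmed / predicted, confirmed / validated_total)."""
    return confirmed / predicted, confirmed / validated_total


# Counts taken from the text above.
default_p, default_r = precision_recall(995, 306, 344)
filtered_p, filtered_r = precision_recall(542, 292, 344)

print(f"default : {default_p:.1%} confirmed, {default_r:.1%} of validated")
print(f"filtered: {filtered_p:.1%} confirmed, {filtered_r:.1%} of validated")
```

The filters therefore remove 453 predictions while losing only 14 of the 306 confirmed targets.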

CONCLUSIONS 

Computational prediction of miRNAs and their targets
has suffered from a high false-positive rate because few
constraints are available to apply. To generate more specific
prediction results, psRobot integrates the biogenesis and
protein association information of plant smRNAs, as well as
conservation, cleavage and smRNA dependency information
for mRNAs. This information can help users
quickly identify bona fide miRNAs or other functional
stem-loop smRNAs and their candidate targets. The
ability to handle both single and batch sequence input
and the availability of online and local versions of the
software make it highly flexible in application.

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Tables 1 and 2. 